ABSTRACT



This paper examines the use and interpretation of the
rank-biserial correlation as an index of item
discrimination. The advantages and disadvantages of
this index are compared with those of alternative
indices derived from the response of upper and lower
groups divided on the basis of total test scores.

Computational procedures and tests of statistical
significance for the rank-biserial correlation are
presented. Appropriate correction for the spurious
correlation arising from the contribution of the item
to total scores is also provided.

Brennan (1969) has discussed the use and interpretation of item
discrimination in the evaluation of criterion-referenced tests.
He recommends the use of the index D, and a more general index $B_i$, both
of which represent the difference in percent correct between upper and
lower groups divided on the basis of total test score.
In the case of D, the students are divided into two equal groups, while
$B_i$ permits the use of any cutoff. Brennan rightly points out
three advantages of D and $B_i$: (1) they
measure degree of discrimination according to a widely
acceptable intuitive notion of the meaning of discrimination, (2) they
are easily computable and interpretable by unsophisticated users, and
(3) they are distribution free, and do not require questionable
assumptions or hazardous approximations in their tests of significance.

There are, however, three aspects of D and $B_i$ which seriously
detract from their value as measures of item discrimination. First, the
dichotomization of the total score variable discards information on
discriminations among students in the upper group and among students in
the lower group. This results in indices largely sensitive to
discrimination in the region of the division between upper and lower groups.
As Brennan himself points out, groups used in the evaluation of
criterion-referenced tests are rarely large, so that any substantial loss of
information is strictly to be avoided. Secondly, the use of D and $B_i$
requires the evaluator to select a cutoff between lower and upper groups.
No criteria for this selection have been offered, so that even the most
experienced evaluator is confronted with a serious problem of judgment.
Furthermore, since the values of the $B_i$ indices are markedly affected by
the cutoff decision, the comparability of the $B_i$ from one test to
another is impaired.

(Footnote: At times Brennan appears to confuse defects of proposed tests of
significance for D with defects of D as a measure of discrimination.
Since D is only a special case of $B_i$, any advantage claimed for $B_i$ is
equally true of D, and any test procedure recommended for $B_i$ is equally
applicable to D.)

Finally, there is a third difficulty which is not unique to D and $B_i$,
but which is shared by most indices of item discrimination. That
difficulty is the spuriously high correlations which result from the fact that
the item itself contributes to the total score. Unless a correction is
introduced, obtained values of D and $B_i$ are positively biased, and the
bias may be pronounced when only a few items contribute to total scores,
as is usually the case for short criterion-referenced tests.

A discrimination index based on rank order correlation will be
presented in the sections which follow. It will be shown to retain the
advantages of the D and $B_i$ indices, while avoiding their defects.

Rank-biserial correlation

A measure of correlation between a ranked variable and a dichotomy
was developed by Cureton (1956, 1968). This measure, called the
rank-biserial correlation, $r_{rb}$, is functionally analogous to the
point-biserial r, but is closely related to Kendall's tau, being based on the
number of agreements and disagreements in rank order between the two
variables. For the purpose of determining correspondence between rank
orders, the dichotomy is considered to be a categorization into two ranks
with multiple ties (Whitfield, 1947).



Consider the tabulation given below, where Y is the rank variable
(for example, ranks of the students on total score) and X is the
dichotomy (X = 1 representing a correct response to a test item, X = 0 an
incorrect response).

             Y (Ranks)

    X = 0    3   5   6   8   9   10

    X = 1    1   2   4   7

X and Y agree in ranking any pair of students when
the higher ranked on Y obtained a correct response, and the lower ranked
obtained an incorrect response. Thus for every rank with X = 1, there
is an agreement for every lower rank which appears with X = 0. The number
of agreements for the ranks of 1, 2, 4, and 7 are 6, 6, 5, and 3
respectively.

On the other hand, a disagreement in rank occurs when the student
higher ranked on Y obtained an incorrect response, and the lower ranked
obtained a correct response. Thus, for every rank with X = 0, there is a
disagreement for every lower rank with X = 1. The number of disagreements
for the ranks of 3, 5, 6, 8, 9, and 10 are 2, 1, 1, 0, 0, and 0
respectively.
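
The counting rule described above can be sketched in a few lines of Python. The function names are illustrative only and do not come from the paper; rank 1 is assumed to be the highest total score, as in the tabulation.

```python
# Ranks on total score (1 = highest), grouped by item response as in the tabulation.
ranks_incorrect = [3, 5, 6, 8, 9, 10]   # X = 0
ranks_correct = [1, 2, 4, 7]            # X = 1


def count_agreements(correct, incorrect):
    """P: pairs in which the correct responder holds the higher rank (smaller number)."""
    return sum(1 for c in correct for w in incorrect if c < w)


def count_disagreements(correct, incorrect):
    """Q: pairs in which the incorrect responder holds the higher rank."""
    return sum(1 for c in correct for w in incorrect if w < c)


# Per-rank counts, matching the figures quoted in the text.
per_rank_agreements = [sum(1 for w in ranks_incorrect if c < w) for c in ranks_correct]
per_rank_disagreements = [sum(1 for c in ranks_correct if w < c) for w in ranks_incorrect]

P = count_agreements(ranks_correct, ranks_incorrect)
Q = count_disagreements(ranks_correct, ranks_incorrect)
print(per_rank_agreements, per_rank_disagreements, P, Q)
```

For the tabulated data the totals are P = 20 and Q = 4, the sums of the per-rank counts listed in the text.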

Cureton defined $r_{rb}$ as follows:

$$r_{rb} = \frac{P - Q}{P_{max}}$$

where P is the total number of agreements, Q is the total number of
disagreements, and $P_{max}$ is the maximum possible value of P. It should be
noted that the numerator of $r_{rb}$ is the same as the numerator of Kendall's
tau, the two measures differing only in the denominator. In the case that no
ties on Y occur, $P_{max} = n_1 n_0$, where $n_1$ is the number of ranks having
X = 1, and $n_0$ is the number of ranks having X = 0. The denominator of
$r_{rb}$ was chosen to insure that the possible range of values of ±1 was
obtainable under all circumstances. Kendall's tau does not necessarily attain
these limits when the relationship between X and Y is perfect. For example,
when n = 5, $n_1 = 2$, and $n_0 = 3$, and the ranks 1 and 2 have scores of
X = 1, $r_{rb} = 1$, but tau = .77. For the data above,

$$r_{rb} = \frac{20 - 4}{(4)(6)} = .67.$$
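
The contrast between the two denominators can be illustrated with a short sketch. It assumes that the tau referred to is the tie-corrected form commonly called tau-b, with the dichotomy treated as two ranks with multiple ties; that assumption reproduces the value of .77 quoted above. Function names are illustrative only.

```python
from math import sqrt


def s_statistic(correct, incorrect):
    """P - Q for ranks on Y (rank 1 = highest); tied pairs contribute nothing."""
    p = sum(1 for c in correct for w in incorrect if c < w)
    q = sum(1 for c in correct for w in incorrect if w < c)
    return p - q


def rank_biserial(correct, incorrect):
    """r_rb = (P - Q) / (n1 * n0); valid only when no ranks on Y are tied."""
    return s_statistic(correct, incorrect) / (len(correct) * len(incorrect))


def tau_b_dichotomy(correct, incorrect):
    """Kendall's tau-b with the dichotomy as a two-category ranking (assumed form)."""
    n1, n0 = len(correct), len(incorrect)
    n = n1 + n0
    pairs = n * (n - 1) / 2
    tied_on_x = n1 * (n1 - 1) / 2 + n0 * (n0 - 1) / 2  # pairs tied on the dichotomy
    return s_statistic(correct, incorrect) / sqrt(pairs * (pairs - tied_on_x))


# Perfect relationship with n = 5, n1 = 2, n0 = 3.
perfect_rrb = rank_biserial([1, 2], [3, 4, 5])
perfect_tau = tau_b_dichotomy([1, 2], [3, 4, 5])

# The ten-rank tabulation from the text.
example_rrb = rank_biserial([1, 2, 4, 7], [3, 5, 6, 8, 9, 10])
print(perfect_rrb, round(perfect_tau, 2), round(example_rrb, 2))
```

Under these assumptions the perfect five-rank case gives $r_{rb} = 1$ while tau stays near .77, and the ten-rank tabulation gives $r_{rb} = .67$.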

Glass (1965, 1966) presented a simplified computational procedure
for $r_{rb}$ when no ties occur. Rather than counting agreements
and disagreements as above, it is necessary only to compute $\bar{Y}_1$ and
$\bar{Y}_0$, the mean ranks for X = 1 and X = 0 respectively; then
$r_{rb} = 2(\bar{Y}_0 - \bar{Y}_1)/n$.
However, this simplification will rarely apply to short
criterion-referenced tests, particularly when the sample size is large, since many
students will obtain identical total scores and be assigned tied ranks.
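
As a quick check of the mean-rank shortcut, the following sketch applies it to the ten untied ranks tabulated earlier and compares the result with (P - Q)/(n1 n0) = 16/24. The function name is illustrative and not taken from Glass.

```python
def glass_rank_biserial(correct, incorrect):
    """r_rb = 2 * (mean rank for X = 0 - mean rank for X = 1) / n, for untied ranks."""
    n = len(correct) + len(incorrect)
    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    return 2 * (mean_incorrect - mean_correct) / n


shortcut = glass_rank_biserial([1, 2, 4, 7], [3, 5, 6, 8, 9, 10])
print(round(shortcut, 4))
```

The mean ranks are 3.5 for X = 1 and about 6.83 for X = 0, so the shortcut returns the same .67 obtained by counting agreements and disagreements.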

Computation of $r_{rb}$ with tied ranks

A correction to $P_{max}$ must be made when ranks are tied. If there is
a perfect relationship between X and Y, agreements are lost among tied
ranks at the point of division between the $n_1$ upper ranks and the $n_0$
lower ranks. In the example given below, there are 6 ranks tied at the point
of division, four having X = 1 and 2 having X = 0.

    X = 0    4.5   4.5   8     9     10

    X = 1    1     4.5   4.5   4.5   4.5

There are (4)(2) = 8 possible agreements lost as a result of the tied
ranks. If $t_1$ is the number of ranks with X = 1 tied at the division, and
$t_0$ the number with X = 0, then $P_{max} = n_1 n_0 - t_1 t_0$.
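
The tie correction can be checked directly on the example. The sketch below assumes that a pair of students with tied ranks counts as neither an agreement nor a disagreement; the function names are illustrative.

```python
def p_max_with_ties(n1, n0, t1, t0):
    """Maximum attainable number of agreements: n1 * n0 - t1 * t0."""
    return n1 * n0 - t1 * t0


def count_agreements(correct, incorrect):
    """Agreements only; pairs tied on rank are skipped by the strict inequality."""
    return sum(1 for c in correct for w in incorrect if c < w)


# The tied example from the text (rank 1 = highest).
ranks_incorrect = [4.5, 4.5, 8, 9, 10]     # X = 0
ranks_correct = [1, 4.5, 4.5, 4.5, 4.5]    # X = 1

attained = count_agreements(ranks_correct, ranks_incorrect)
ceiling = p_max_with_ties(n1=5, n0=5, t1=4, t0=2)
print(attained, ceiling)
```

Both quantities equal 17, so this arrangement, which is as close to perfect as the ties allow, yields $r_{rb} = 1$ once the corrected $P_{max}$ is used.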

When a bivariate frequency distribution is available, a simple
computational procedure may be followed which incorporates the correction
for ties, without requiring the assignment of ranks to the Y variable.
The notation for the bivariate distribution shown below represents the
frequency of correct and incorrect responses for each possible total score,
along with cumulative frequencies for correct and incorrect responses.

    Total     Correct      Cumulative    Incorrect    Cumulative
    Score     Frequency    Frequency     Frequency    Frequency

    Y_k       f_{1,k}      F_{1,k}       f_{0,k}      F_{0,k}
    Y_{k-1}   f_{1,k-1}    F_{1,k-1}     f_{0,k-1}    F_{0,k-1}
    ...       ...          ...           ...          ...
    Y_i       f_{1,i}      F_{1,i}       f_{0,i}      F_{0,i}
    ...       ...          ...           ...          ...
    Y_2       f_{1,2}      F_{1,2}       f_{0,2}      F_{0,2}
    Y_1       f_{1,1}      F_{1,1}       f_{0,1}      F_{0,1}

Note, of course, that the cumulative frequencies are
$F_{1,i} = \sum_{j=1}^{i} f_{1,j}$ and $F_{0,i} = \sum_{j=1}^{i} f_{0,j}$,
with $F_{1,0} = F_{0,0} = 0$. We also require a symbol $F_i$ for the marginal
cumulative frequency, $F_i = F_{1,i} + F_{0,i}$. Then the number of agreements
is

$$P = \sum_{i=1}^{k} f_{1,i} F_{0,i-1},$$

and the number of disagreements is

$$Q = \sum_{i=1}^{k} f_{0,i} F_{1,i-1}.$$

To obtain $t_0$ and $t_1$, examine the marginal cumulative frequencies to find
$F_t$ and $F_{t-1}$ for which $F_t \ge n_0 \ge F_{t-1}$. Then
$t_1 = n_1 - F_t$ and $t_0 = n_0 - F_{t-1}$.
Since $n_1 = F_{1,k}$ and $n_0 = F_{0,k}$,

$$P_{max} = n_1 n_0 - t_1 t_0 = F_{1,k}F_{0,k} - (F_{1,k} - F_t)(F_{0,k} - F_{t-1})$$

$$= F_{1,k}F_{0,k} - F_{1,k}F_{0,k} + F_{1,k}F_{t-1} + F_{0,k}F_t - F_t F_{t-1} = F_{1,k}F_{t-1} + F_{0,k}F_t - F_t F_{t-1}.$$

Thus $r_{rb}$ becomes, in frequency distribution notation,

$$r_{rb} = \frac{\sum_{i=1}^{k} \left(f_{1,i}F_{0,i-1} - f_{0,i}F_{1,i-1}\right)}{F_{1,k}F_{t-1} + F_{0,k}F_t - F_t F_{t-1}}.$$



In the example distribution, P = 73, Q = 20, $F_t = 12$, $F_{t-1} = 8$,
$F_{1,k} = 12$, and $F_{0,k} = 8$. Thus

$$r_{rb} = \frac{73 - 20}{(12)(8) + (8)(12) - (12)(8)} = \frac{53}{96} = .55.$$
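
The numerator of the frequency-distribution formula can be sketched as follows. For illustration only, the ten untied ranks of the earlier tabulation are recast as a frequency distribution over scores 1 to 10 (rank 1 corresponding to score 10); that recasting and the function name are assumptions, not part of the paper.

```python
from itertools import accumulate


def agreements_and_disagreements(f1, f0):
    """P and Q from frequencies listed from the lowest score to the highest.

    f1[i] and f0[i] are the numbers of correct and incorrect responders
    at the (i + 1)-th lowest total score.
    """
    # below1[i] / below0[i] hold the cumulative frequency strictly below score i,
    # which is F_{1,i-1} / F_{0,i-1} in the paper's 1-based notation.
    below1 = [0] + list(accumulate(f1))
    below0 = [0] + list(accumulate(f0))
    p = sum(f1[i] * below0[i] for i in range(len(f1)))
    q = sum(f0[i] * below1[i] for i in range(len(f0)))
    return p, q


# Scores 1..10; correct responders at scores 4, 7, 9, 10 (ranks 7, 4, 2, 1).
f1 = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
f0 = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]

P, Q = agreements_and_disagreements(f1, f0)
print(P, Q)
```

The result, P = 20 and Q = 4, matches the counts obtained by comparing ranks pair by pair, without any ranks having been assigned.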

The uncorrected value $r_{rb} = .55$ may be compared with $r_{pb} = .50$ and
D = .40 for the same data. Also $B_1 = .63$, $B_2 = .67$, $B_3 = .71$,
$B_4 = .67$, $B_5 = .58$, $B_6 = .25$, $B_7 = .27$, $B_8 = .47$, $B_9 = .44$
and $B_{10} = .42$, where the subscript refers to the lowest
value of $Y_i$ included in the "upper" group. It is interesting to note that
the highest values of $B_i$ occur with cutoffs below the median, whereas most
evaluators would place the cutoff above the median in distinguishing
"acceptable" from "unacceptable" levels of performance.

Correction for spurious correlation 

Like other item discrimination indices, $r_{rb}$ will be subject to
spurious correlation arising from the contribution of the item to the
total score, if the computational procedure given above is followed.
However, the formula for the frequency distribution computation is
easily modified to eliminate the bias due to spurious correlation.
Since the total score is increased by one for those who have a correct
response on the item, the same computation procedure may be followed if
the total score is simply reduced by one for all students having a
correct response. In terms of the frequency distribution, the reduction
simply requires each frequency and cumulative frequency in the columns
for correct response to be shifted down to the next lower score, and the
computation of a new set of marginal cumulative frequencies. If this is
done, the formulas (still using the original notation, with any cumulative
frequency whose subscript falls below 1 taken as zero) become:

$$P = \sum_{i=1}^{k} f_{1,i}F_{0,i-2} \quad \text{and} \quad Q = \sum_{i=1}^{k} f_{0,i}F_{1,i},$$

$$P_{max} = F_{1,k}(F_{1,t} + F_{0,t-1}) + F_{0,k}(F_{1,t+1} + F_{0,t}) - (F_{1,t} + F_{0,t-1})(F_{1,t+1} + F_{0,t}),$$

where t is such that $F_{1,t+1} + F_{0,t} \ge n_0 \ge F_{1,t} + F_{0,t-1}$.

For the example above

    P = 8 + 8 + 7 + 6 + 12 + 12 + 4 = 57
    Q = 9 + 8 + 6 = 23
    F_{1,t+1} + F_{0,t} = 6 + 6 = 12
    F_{1,t} + F_{0,t-1} = 2 + 3 = 5

$$r_{rb} = \frac{57 - 23}{(12)(5) + (8)(12) - (12)(5)} = \frac{34}{96} = .35,$$

in comparison with the uncorrected value $r_{rb} = .55$.
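
The shifted sums can be checked against a direct pairwise count. The frequencies below are hypothetical (a five-point score scale invented for illustration, since the paper's example distribution is not reproduced here), and the function names are assumptions.

```python
from itertools import accumulate


def corrected_p_q(f1, f0):
    """Corrected P and Q: sum f1[i] * F0[i-2] and sum f0[i] * F1[i] (0-based i)."""
    cum1 = list(accumulate(f1))  # cum1[i] = correct responders at or below score i
    cum0 = list(accumulate(f0))  # cum0[i] = incorrect responders at or below score i
    p = sum(f1[i] * cum0[i - 2] for i in range(2, len(f1)))
    q = sum(f0[i] * cum1[i] for i in range(len(f0)))
    return p, q


def brute_force_p_q(f1, f0):
    """Same quantities by lowering each correct responder's score by one point."""
    correct = [i - 1 for i, f in enumerate(f1) for _ in range(f)]
    incorrect = [i for i, f in enumerate(f0) for _ in range(f)]
    p = sum(1 for c in correct for w in incorrect if c > w)
    q = sum(1 for c in correct for w in incorrect if c < w)
    return p, q


# Hypothetical frequencies for scores 0..4, listed from lowest to highest.
f1 = [0, 1, 2, 3, 4]   # correct on the item
f0 = [3, 2, 2, 1, 0]   # incorrect on the item

formula = corrected_p_q(f1, f0)
direct = brute_force_p_q(f1, f0)
print(formula, direct)
```

For these invented frequencies both routes give P = 49 and Q = 14, which supports reading the corrected sums as an ordinary agreement count on scores from which the item's own point has been removed.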



The example demonstrates that the effect of spurious correlation
can be very substantial when the number of items is small. It is
recommended that the corrected formula

$$r_{rb} = \frac{\sum_{i=1}^{k} \left(f_{1,i}F_{0,i-2} - f_{0,i}F_{1,i}\right)}{F_{1,k}(F_{1,t} + F_{0,t-1}) + F_{0,k}(F_{1,t+1} + F_{0,t}) - (F_{1,t} + F_{0,t-1})(F_{1,t+1} + F_{0,t})}$$

be used whenever $r_{rb}$ is used as an item discrimination index.

Tests of significance

Several different approaches may be taken in testing the statistical
significance of $r_{rb}$. Cureton (1956) suggested that the Mann-Whitney U-test
be used for this purpose (see Siegel, 1956). The value of U employed as a
test statistic corresponds to the smaller of the values of P or Q as
computed above. The tables of critical values of U given in Siegel's book
provide exact tests so long as $n_0 \le 20$ and $n_1 \le 20$, and no ties
appear in the ranked variable.

In the case of ties, when $n_0 \le 8$ or $n_1 \le 8$, an appropriate procedure
is to perform an exact randomization test on P. The value of P is
determined for each of the $\binom{n}{n_1} = n!/(n_1!\,n_0!)$ allocations of
the ranks between the values of the dichotomy, with the restriction that
$n_1$ ranks are assigned to X = 1 and $n_0$ ranks to X = 0. For an
$\alpha$% test, the distribution of possible values of P is used to determine
if $\alpha$% or less of the values are equal to or more extreme than the
observed value of P. If this is the case, the observed value is declared
significant.
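
A minimal sketch of the exact randomization test follows. It assumes a one-sided test in which "more extreme" means a larger value of P, and it uses the ten untied ranks tabulated earlier purely as an illustration; because allocations are enumerated by position rather than by rank value, tied ranks can be passed in unchanged. The function names are not from the paper.

```python
from itertools import combinations


def agreements(correct, incorrect):
    """P: pairs where the correct responder has the better (smaller) rank."""
    return sum(1 for c in correct for w in incorrect if c < w)


def exact_randomization(ranks, is_correct):
    """Return (observed P, allocations with P >= observed, total allocations)."""
    n = len(ranks)
    observed_idx = [i for i in range(n) if is_correct[i]]
    n1 = len(observed_idx)

    def p_for(chosen):
        chosen = set(chosen)
        correct = [ranks[i] for i in range(n) if i in chosen]
        incorrect = [ranks[i] for i in range(n) if i not in chosen]
        return agreements(correct, incorrect)

    observed = p_for(observed_idx)
    values = [p_for(c) for c in combinations(range(n), n1)]
    as_extreme = sum(1 for v in values if v >= observed)
    return observed, as_extreme, len(values)


ranks = list(range(1, 11))
is_correct = [r in (1, 2, 4, 7) for r in ranks]
observed_p, as_extreme, total = exact_randomization(ranks, is_correct)
print(observed_p, as_extreme, total, round(as_extreme / total, 3))
```

For this illustration 12 of the 210 allocations give P of 20 or more, a one-sided proportion of about .057. The number of allocations grows quickly with n, which is the computational burden the text goes on to describe.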



Except for very small values of $n = n_0 + n_1$, or extreme splits
between $n_0$ and $n_1$, the computational labor of the exact randomization
test is excessive due to the large number of values of P to be computed,
even when performed by a digital computer. Where the cost of computer
time is excessive, the only alternative available is the approximate
"jackknife" technique. The details of a "jackknife" solution are too
extensive to be presented here, and the reader is referred to the
discussion by Mosteller and Tukey (1968).

When $n_1 > 8$ and $n_0 > 8$, whether or not ties are present, a very
satisfactory normal approximation may be employed. Under the null
hypothesis, P - Q will be approximately normally distributed with mean
$\mu = 0$ and variance

$$\sigma^2_{P-Q} = \frac{n_1 n_0}{3n(n-1)}\left[n^3 - n - \sum_{i=1}^{k}\left(f_i^3 - f_i\right)\right]$$

as given by Kendall (1962), where $f_i$ refers to the marginal frequency of
occurrence of $Y_i$. The approximation is further improved by the
incorporation of a correction for continuity reducing P - Q in absolute
value. Thus the test statistic

$$z = \frac{|P - Q| - C}{\sigma_{P-Q}}$$

is referred to tables of the unit normal distribution, where C is the value
of the correction for continuity.

When no tied ranks are present C = 1. In other cases an approximate
correction suggested by Kendall (1962) may be obtained from the following
formula. Let $Y_h$ and $Y_l$ be the highest and lowest scores in the
distribution with $f_h > 0$ and $f_l > 0$, respectively. Then

$$C = \frac{2n - f_h - f_l}{2(g - 1)}$$

where g is the number of distinct $Y_i$ with $f_i > 0$.
The value of C given by this formula is one-half of the average distance
between adjacent possible values of P - Q.

In the example above, the value of P - Q = 34, when corrected for
spurious correlation. Values of $f_1$ through $f_{10}$ are 1, 1, 3, 0, 7,
2, 2, 2, 1, and 1, respectively, using $f_i = f_{1,i+1} + f_{0,i}$. Then

$$\sigma^2_{P-Q} = (8/95)\,[7980 - 24 - 336 - 3(6)] = 8(7602)/95 = 640.168$$

and $\sigma_{P-Q} = 25.30$. For the highest and lowest scores
$f_h = f_l = 1$, and the number of distinct scores occurring is eight, giving
g - 1 = 7. Then

$$C = \frac{2(20) - 1 - 1}{2(7)} = \frac{19}{7} = 2.71$$

and

$$z = \frac{34 - 2.71}{25.30} = \frac{31.29}{25.30} = 1.24,$$

indicating that $r_{rb}$ is not significant at $\alpha = .05$. It should be
noted that the correction for continuity is quite important in applications
of $r_{rb}$ to tests with only a few items, as illustrated here.
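
The arithmetic of the normal approximation can be retraced with a short sketch. The variance expression used is the one implied by the worked figures above, $n_1 n_0 [n^3 - n - \sum (f_i^3 - f_i)] / (3n(n-1))$; the continuity correction is passed in as a number, here the value 19/7 taken from the text. Function names are illustrative.

```python
from math import sqrt


def variance_p_minus_q(n1, n0, marginal_freqs):
    """Null variance of P - Q with ties on Y, as implied by the worked example."""
    n = n1 + n0
    tie_term = sum(f ** 3 - f for f in marginal_freqs)
    return n1 * n0 * (n ** 3 - n - tie_term) / (3 * n * (n - 1))


def z_statistic(p_minus_q, variance, c):
    """Continuity-corrected normal deviate: (|P - Q| - C) / sigma."""
    return (abs(p_minus_q) - c) / sqrt(variance)


# Figures from the worked example: n1 = 12, n0 = 8, P - Q = 34.
marginal = [1, 1, 3, 0, 7, 2, 2, 2, 1, 1]
var = variance_p_minus_q(12, 8, marginal)
sigma = sqrt(var)
z = z_statistic(34, var, c=19 / 7)   # C = 19/7 is the value stated in the text
print(round(var, 3), round(sigma, 2), round(z, 2))
```

The sketch returns a variance of about 640.17, a standard deviation of about 25.30, and z of about 1.24, in line with the figures in the text. Setting c to zero raises z to roughly 1.34, which gives a sense of how much the continuity correction matters for so short a test.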

Comparison of $r_{rb}$ with D and $B_i$

The basic nature of $r_{rb}$ is quite similar to D and $B_i$ in several
respects. All are based on the same intuitive notion of discrimination,
i.e., that an item discriminates (positively) between individuals whenever
their difference in response to the item corresponds to their difference
in performance as based on total score. The value of $r_{rb}$ is subject to
the following simple interpretation: $r_{rb}$ is an estimate of the difference
between the probability that the rank order of two randomly selected
individuals on total score and item will be in agreement, and the
probability that their rank order on total score and item will be in
disagreement. The D and $B_i$ indices are subject to exactly the same
interpretation, except that only two ranks of performance are recognized
on the basis of total score, i.e., an upper level and a lower level. In
fact, the computational formulas for D and $B_i$ are merely special cases
of the $r_{rb}$ formula when the rank ordering is dichotomized. Since $r_{rb}$,
D, and $B_i$ are based only on ordering of performance, not on arithmetic
distances between performance levels, all are entirely distribution-free,
being invariant under any monotonic transformation of the total scores.

D and $B_i$ are slightly easier to compute, particularly when ties are
present in the rank order on total score. However, this computational
simplicity is purchased at the expense of information lost as a result
of the dichotomization of the total score ranking. There does not seem
to be any logical reason that an index of discrimination should ignore
discriminations among students in the upper group, and among students in
the lower group. Since $r_{rb}$ incorporates all possible information on
discrimination obtainable from a rank ordering, it is to be preferred on
that basis if no other. Furthermore, $r_{rb}$ avoids entirely the difficulty
of judging an appropriate point of dichotomization of the total scores
which is involved in D and $B_i$.

The remaining advantages of $r_{rb}$ concern technical statistical
properties. The dichotomization involved in D and $B_i$ produces indices
with greater sampling variability and tests of significance of lesser
power-efficiency. The power-efficiency of the Mann-Whitney U used to test
$r_{rb}$ is approximately 95% against normal alternatives.
The power-efficiency of the median test, which corresponds to the
test for D, is about 95% for n = 6, declining to an asymptotic value of
63% as n increases. Thus a sample size considerably larger than that
used with $r_{rb}$ is required if the test of D is to have equivalent power,
unless the sample sizes are very small.

Finally, D and $B_i$, as presented by Brennan (1969), have not been
modified to correct for spurious correlation. This fact not only
produces a positive bias in the reported values of the indices, but also
invalidates the test of significance presented by Brennan. While it
would be no more difficult to modify the computation of D and $B_i$ and their
test procedure than it was to modify $r_{rb}$ and its test, the general
superiority of $r_{rb}$ would seem to make unnecessary the additional effort
required to develop such modifications.